Information about the structure of the dataset is available at https://github.com/arquivo/pwa-technologies/wiki/Link-graph-dataset-definition

Data is distributed as .tar.gz archives, with 300 files per collection, named part-XXX.tar.gz, where XXX runs from 0 to 299.

Extracted files are JSONL, each sorted alphabetically by page SURT. Each file contains a random sample of the pages; there is no underlying order across files.
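
As a rough illustration of reading one archive without unpacking it to disk, the following Python sketch streams records from a single part file. The helper name `iter_records` is hypothetical, and the sketch assumes only what is stated above: a gzip-compressed tar archive whose regular-file members are UTF-8 JSONL (one JSON object per line). The member names inside the archive and the exact record fields are not assumed; see the wiki page linked above for the field definitions.

```python
import io
import json
import tarfile


def iter_records(archive_path):
    """Yield one parsed JSON object per non-empty line of every file in the archive.

    Hypothetical helper: assumes each regular-file member is UTF-8 JSONL.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and other non-file entries.
            if not member.isfile():
                continue
            raw = archive.extractfile(member)
            with io.TextIOWrapper(raw, encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Calling `iter_records` on the path of a downloaded part file yields its records in the order they are stored, that is, in SURT order within that file. Because there is no order across files, any processing that needs a global ordering would have to merge or re-sort the output of all 300 parts.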

